Analyzing and Exploring Fault-tolerant Distributed Memories for NoCs
Authors
Abstract
Advances in technology scaling make Networks-on-Chip (NoCs) increasingly susceptible to failures that cause various reliability challenges. As on-chip memories occupy a growing share of chip area, maintaining the fault tolerance of distributed on-chip memories becomes a major design challenge. We propose a system-level design methodology for scalable fault tolerance of distributed on-chip memories in NoCs. We introduce a novel reliability clustering model for fault-tolerance analysis and shared redundancy management of on-chip memory blocks. We perform extensive design space exploration, applying the proposed reliability clustering to a block-redundancy fault-tolerant scheme to evaluate the tradeoffs between reliability, performance, and overheads. Evaluations on a 64-core chip multiprocessor (CMP) with an 8x8 mesh NoC show that distinct strategies in our case study may yield up to 20% improvement in performance and 25% improvement in energy savings across different benchmarks, and uncover interesting design configurations.
Similar resources
FT-Z-OE: A Fault Tolerant and Low Overhead Routing Algorithm on TSV-based 3D Network on Chip Links
Three-dimensional Network-On-Chips (3D NOC) are the most efficient communication structures for complex multi-processor System-On-Chips (SOC). Such structures utilize short vertical interconnects in 3D ICs together with scalability of NOC to improve performance of communications in SOCs. By scaling trends in 3D integration, probability of fault occurrence increases that leads to low yield of li...
Fault-tolerant Routing Scheme for Nocs
A NoC architecture offers high reliability since it has multiple routes from the host to clients. A fault-tolerant NoC framework is proposed that achieves maximum performance under fault. NoCs under fault become totally nonfunctional. Hence the faulty components are handled separately, which ensures that the network is not partitioned and results in high connectivity even under high fault. NoCs a...
Using Peer Support to Reduce Fault-Tolerant Overhead in Distributed Shared Memories
We present a peer logging system for reducing performance overhead in fault-tolerant distributed shared memory systems. Our system provides fault-tolerant shared memory using individual checkpointing and rollback. Peer logging logs DSM modification messages to remote nodes instead of to local disks. We present results for implementations of our fault-tolerant technique using simulations of both...
Thesis Proposal Compositional Fault-tolerant Distributed Object Systems
Research is proposed into the theory and practice of distributed shared object systems. Specific points of inquiry are the application of compositional techniques to such systems, and techniques for constructing fault-tolerant objects. In particular, we give an object-oriented model of concurrent systems, and show how to support proof reuse by applying existing compositional proof techniques to...
Fault-Tolerant Distributed Algorithms on VLSI Chips
The Dagstuhl seminar 08371 on Fault-Tolerant Distributed Algorithms on VLSI Chips was devoted to exploring whether the wealth of existing fault-tolerant distributed algorithms research can be utilized for meeting the challenges of future-generation VLSI chips. Participants from both the distributed fault-tolerant algorithms community, interested in this emerging application domain, and from the ...
Publication date: 2013